Information Bottleneck Co-clustering
Abstract
Co-clustering has emerged as an important approach for mining contingency data matrices. We present a novel approach to co-clustering based on the Information Bottleneck principle, called Information Bottleneck Co-clustering (IBCC), which supports both soft-partition and hard-partition co-clusterings and leverages an annealing-style strategy to escape local optima. Existing co-clustering methods require the user to specify the numbers of row- and column-clusters separately. In practice, however, the numbers of row- and column-clusters may not be independent. To address this issue, we also present an agglomerative Information Bottleneck Co-clustering (aIBCC) approach, which automatically captures the relation between the numbers of clusters. The experimental results demonstrate the effectiveness and efficiency of our techniques.
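The central quantity in information-theoretic co-clustering is the mutual information I(X̂; Ŷ) that survives between row clusters and column clusters of the normalized contingency table. The Python sketch below is illustrative only and is not the IBCC or aIBCC procedure described above. It computes that quantity for hard partitions and then runs a hypothetical greedy agglomerative loop that, at each step, merges whichever pair of row clusters or column clusters loses the least I(X̂; Ŷ). The function names, the single budget on the total number of clusters, and the greedy row-or-column choice are assumptions. Soft partitions and annealing are not covered.

```python
import math
from itertools import combinations


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as a 2-D list summing to 1."""
    row_marg = [sum(r) for r in joint]
    col_marg = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing (0 * log 0 := 0)
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi


def cluster_joint(counts, row_groups, col_groups):
    """Aggregate a contingency table into p(x_hat, y_hat) for the given index groups."""
    total = sum(map(sum, counts))
    return [[sum(counts[i][j] for i in rg for j in cg) / total for cg in col_groups]
            for rg in row_groups]


def greedy_agglomerative_cocluster(counts, n_clusters_total):
    """Hypothetical sketch (not aIBCC): start from singleton row and column clusters
    and repeatedly apply the single row-merge or column-merge that keeps the largest
    I(X_hat; Y_hat), until rows + columns equals n_clusters_total."""
    if n_clusters_total < 2:
        raise ValueError("need at least one row cluster and one column cluster")
    rows = [[i] for i in range(len(counts))]
    cols = [[j] for j in range(len(counts[0]))]
    while len(rows) + len(cols) > n_clusters_total:
        best = None  # (retained MI, side, merged groups)
        for side, groups in (("row", rows), ("col", cols)):
            if len(groups) < 2:
                continue
            for a, b in combinations(range(len(groups)), 2):
                merged = [g for k, g in enumerate(groups) if k not in (a, b)]
                merged.append(groups[a] + groups[b])
                if side == "row":
                    mi = mutual_information(cluster_joint(counts, merged, cols))
                else:
                    mi = mutual_information(cluster_joint(counts, rows, merged))
                if best is None or mi > best[0]:  # ties keep the first candidate
                    best = (mi, side, merged)
        _, side, merged = best
        if side == "row":
            rows = merged
        else:
            cols = merged
    return rows, cols


if __name__ == "__main__":
    counts = [[5, 5, 0, 0],
              [5, 5, 0, 0],
              [0, 0, 5, 5],
              [0, 0, 5, 5]]
    rows, cols = greedy_agglomerative_cocluster(counts, 4)
    print(rows, cols)  # [[0, 1], [2, 3]] [[0, 1], [2, 3]]
```

On this block-diagonal table, identical rows and identical columns merge first at no information cost, so the loop ends with two row clusters and two column clusters that still retain the full 1 bit of I(X; Y). Each step re-evaluates every candidate pair from scratch, so the sketch is only practical for small tables.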
Similar Resources
A fuzzy co-clustering algorithm for biomedical data
Fuzzy co-clustering extends co-clustering by assigning membership functions to both the objects and the features, and helps improve the clustering accuracy of biomedical data. In this paper, we introduce a new fuzzy co-clustering algorithm based on the information bottleneck, named ibFCC. The ibFCC formulates an objective function which includes a distance function that employs information bott...
Co-Clustering via Information-Theoretic Markov Aggregation
We present an information-theoretic cost function for co-clustering, i.e., for simultaneous clustering of two sets based on similarities between their elements. By constructing a simple random walk on the corresponding bipartite graph, our cost function is derived from a recently proposed generalized framework for information-theoretic Markov chain aggregation. The goal of our cost function is ...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have proven effective in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, with competitive results reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Information Bottleneck for Non Co-Occurrence Data
We present a general model-independent approach to the analysis of data in cases when these data do not appear in the form of co-occurrence of two variables X, Y, but rather as a sample of values of an unknown (stochastic) function Z(X, Y). For example, in gene expression data, the expression level Z is a function of gene X and condition Y; or in movie ratings data the rating Z is a function o...
An Analysis of Model-based Clustering, Competitive Learning, and Information Bottleneck
This paper provides a general formulation of probabilistic model-based clustering with deterministic annealing (DA), which leads to a unifying analysis of k-means, EM clustering, soft competitive learning algorithms (e.g., self-organizing map), and information bottleneck. The analysis points out an interesting yet not well-recognized connection between the k-means and EM clustering—they are jus...